[code_review] Misc improvements part 4 by suhaibmujahid · Pull Request #5588 · mozilla/bugbug

suhaibmujahid · 2026-01-05T02:51:41Z

These improvements can be reviewed commit by commit.

Copilot

Pull request overview

This PR refactors the code review evaluation infrastructure by replacing old script-based evaluation tools with a more modular architecture and W&B Weave integration for tracking evaluations.

Changes:

Removes legacy evaluation scripts (code_review_tool_evaluator.py, code_review_tool_evaluator_report.py) and experimental files
Introduces new modular tools for patch summarization, suggestion filtering, and comment matching
Adds Jupyter notebooks for dataset creation and evaluation using W&B Weave
Refactors CodeReviewTool to use Protocol-based dependency injection for better testability
Updates platform base classes to accept both str and int for patch_id parameters
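As an illustration of the Protocol-based dependency injection mentioned above, here is a minimal sketch. The names `PatchSummarizer`, `SuggestionFilterer`, `ReviewTool`, `FakeSummarizer`, and `KeepFirstFilterer` are hypothetical and are not taken from the bugbug code base; the sketch only shows how `typing.Protocol` lets a tool accept any collaborator with the right method shape, which is what makes swapping in test doubles easy.

```python
from dataclasses import dataclass
from typing import Protocol


class PatchSummarizer(Protocol):
    """Hypothetical protocol: anything with a matching `run` method qualifies."""

    def run(self, patch: str) -> str: ...


class SuggestionFilterer(Protocol):
    """Hypothetical protocol: returns the positions of suggestions to keep."""

    def run(self, suggestions: list[str]) -> list[int]: ...


@dataclass
class ReviewTool:
    """Hypothetical stand-in for a tool that receives its collaborators."""

    summarizer: PatchSummarizer
    filterer: SuggestionFilterer

    def run(self, patch: str, suggestions: list[str]) -> tuple[str, list[str]]:
        summary = self.summarizer.run(patch)
        kept = self.filterer.run(suggestions)
        return summary, [suggestions[i] for i in kept]


# Test doubles need no inheritance: structural typing is enough.
class FakeSummarizer:
    def run(self, patch: str) -> str:
        return f"{len(patch.splitlines())} line(s) changed"


class KeepFirstFilterer:
    def run(self, suggestions: list[str]) -> list[int]:
        return [0] if suggestions else []


tool = ReviewTool(FakeSummarizer(), KeepFirstFilterer())
result = tool.run("+a\n-b", ["rename x", "add test"])
print(result)
```

Because the fakes never call a model, a test built this way can exercise the orchestration logic without network access.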

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 14 comments.

File	Description
scripts/code_review_tool_evaluator_report.py	Removed legacy evaluation report generator
scripts/code_review_tool_evaluator.py	Removed legacy evaluation script (613 lines)
experiments/review_helper_modify_filtering_step.ipy	Removed experimental filtering modification script
requirements.txt	Added weave>=0.50.0 for evaluation tracking
notebooks/code_review_evaluation.ipynb	New notebook for running W&B Weave evaluations
notebooks/code_review_create_dataset.ipynb	New notebook for creating evaluation datasets
bugbug/tools/suggestion_filtering/prompts.py	Extracted filtering prompts to dedicated module
bugbug/tools/suggestion_filtering/agent.py	New modular suggestion filtering tool
bugbug/tools/patch_summarization/prompts.py	Extracted summarization prompts to dedicated module
bugbug/tools/patch_summarization/agent.py	New modular patch summarization tool
bugbug/tools/comment_matching/prompts.py	New prompts for LLM-based comment matching
bugbug/tools/comment_matching/agent.py	New tool for matching generated vs ground truth comments
bugbug/tools/code_review/scorer.py	New Weave scorers for evaluation metrics
bugbug/tools/code_review/utils.py	Refactored to work with structured comment objects
bugbug/tools/code_review/prompts.py	Removed prompts moved to specialized modules
bugbug/tools/code_review/agent.py	Refactored to use Protocol-based dependencies
bugbug/tools/base.py	Simplified by removing version property and print method
bugbug/tools/core/platforms/base.py	Updated signature to accept str or int for patch_id
bugbug/tools/core/platforms/phabricator.py	Updated signature to accept str or int for patch_id
bugbug/tools/core/platforms/swarm.py	Updated signature to accept str or int for patch_id
bugbug/code_search/searchfox_api.py	Made get_file parameter optional with default implementation
bugbug/code_search/mozilla.py	Made get_file parameter optional with fallback

Moved suggestion filtering logic from code_review/agent.py to a new suggestion_filtering module. Introduced SuggestionFilteringTool for filtering review comments, updated CodeReviewTool to use the new filterer, and relocated related prompt templates. This improves modularity and separation of concerns for suggestion filtering.

marco-c · 2026-01-12T12:09:44Z

Did "Refactor filtering to return indices instead of full comments" improve filtering results?

suhaibmujahid · 2026-01-12T14:10:29Z

> Did "Refactor filtering to return indices instead of full comments" improve filtering results?

The main goal was to simplify the tracking. Now it is an independent tool, so we can evaluate it in isolation.
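To make the idea of returning indices concrete, here is a minimal sketch of how a caller might map a filter's index output back onto the original comments. The helper `select_by_indices` and the sample data are hypothetical; the PR does not show how the real tool handles malformed indices, so the defensive checks below are an assumption.

```python
def select_by_indices(comments: list[str], kept_indices: list[int]) -> list[str]:
    """Return the comments at the positions the filter chose to keep.

    Out-of-range and duplicate indices are skipped (an assumption made for
    this sketch) so that a malformed model response cannot raise an
    IndexError or repeat a comment.
    """
    seen: set[int] = set()
    kept: list[str] = []
    for index in kept_indices:
        if 0 <= index < len(comments) and index not in seen:
            seen.add(index)
            kept.append(comments[index])
    return kept


comments = [
    "Consider handling the empty list case.",
    "Nit: rename this variable.",
    "This lock is never released on the error path.",
]
# Index 2 is repeated and index 7 does not exist; both are ignored.
filtered = select_by_indices(comments, [2, 0, 2, 7])
print(filtered)
```

A list of small integers is also easier to log and compare across runs than full comment bodies, which fits the stated goal of simpler tracking.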

suhaibmujahid added 3 commits December 24, 2025 14:10

Remove unused _print_answer method and related calls

2fb736c

We can have better tracking with W&B Weave

Refactor FunctionSearchMozilla file retrieval logic

9fe827b

Improve the factory method to create a CodeReviewTool with default co…

bd85288

…mponents

suhaibmujahid marked this pull request as ready for review January 10, 2026 22:34

suhaibmujahid requested review from Copilot and marco-c January 10, 2026 22:34

Copilot started reviewing on behalf of suhaibmujahid January 10, 2026 22:35

Copilot AI reviewed Jan 10, 2026

suhaibmujahid added 13 commits January 11, 2026 21:25

Add run_by_diff_id method to CodeReviewTool

95678ef

Introduces a run_by_diff_id method that retrieves a patch by diff ID from review_data and runs the review process.
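The method described above could look roughly like the following sketch. Only the names `run_by_diff_id`, `review_data`, and `get_patch_by_id` come from this PR; `Patch`, `InMemoryReviewData`, `ReviewTool`, and the body of `run` are hypothetical stand-ins, and accepting both `str` and `int` IDs mirrors the signature change made elsewhere in the PR.

```python
from dataclasses import dataclass
from typing import Protocol, Union

PatchId = Union[str, int]


@dataclass
class Patch:
    """Hypothetical patch record."""

    diff_id: int
    raw_diff: str


class ReviewData(Protocol):
    def get_patch_by_id(self, patch_id: PatchId) -> Patch: ...


class InMemoryReviewData:
    """Hypothetical review_data backend used only for this sketch."""

    def __init__(self, patches: list[Patch]) -> None:
        self._patches = {patch.diff_id: patch for patch in patches}

    def get_patch_by_id(self, patch_id: PatchId) -> Patch:
        # Accept "42" as well as 42 by normalizing to int.
        return self._patches[int(patch_id)]


class ReviewTool:
    def __init__(self, review_data: ReviewData) -> None:
        self.review_data = review_data

    def run(self, patch: Patch) -> list[str]:
        # Placeholder for the real review process.
        return [f"reviewed diff {patch.diff_id} ({len(patch.raw_diff)} chars)"]

    def run_by_diff_id(self, diff_id: PatchId) -> list[str]:
        # Look the patch up first, then delegate to the normal entry point.
        patch = self.review_data.get_patch_by_id(diff_id)
        return self.run(patch)


tool = ReviewTool(InMemoryReviewData([Patch(diff_id=42, raw_diff="+x = 1")]))
by_int = tool.run_by_diff_id(42)
by_str = tool.run_by_diff_id("42")
print(by_int)
```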

Support providing the patch summary when generating suggestions

3794dad

Update get_patch_by_id to accept str or int patch_id

8d64fe6

Remove version property from GenerativeModelTool

2458610

Eliminated the abstract version property from GenerativeModelTool and removed the version attribute from CodeReviewTool since it is not used.

Refactor patch summarization into separate tool

c9923f3

Add XML-like tags to filtering prompt template

b3d4942

Wrapped comments and rejected examples in <comments-to-filter> and <rejected-examples> tags to improve prompt structure and clarity.
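A minimal sketch of such a template follows. Only the tag names `<comments-to-filter>` and `<rejected-examples>` come from the commit description; the surrounding wording, the numbering scheme, and the `build_filtering_prompt` helper are assumptions made for illustration.

```python
# Hypothetical template: only the two tag names are taken from the commit.
FILTERING_TEMPLATE = """Decide which of the review comments below are worth keeping.

<comments-to-filter>
{comments}
</comments-to-filter>

<rejected-examples>
{rejected_examples}
</rejected-examples>
"""


def build_filtering_prompt(comments: list[str], rejected_examples: list[str]) -> str:
    """Render the template, numbering comments so they can be referred to by index."""
    numbered = "\n".join(f"{i}: {text}" for i, text in enumerate(comments))
    rejected = "\n".join(f"- {text}" for text in rejected_examples)
    return FILTERING_TEMPLATE.format(comments=numbered, rejected_examples=rejected)


prompt = build_filtering_prompt(
    comments=["Missing null check.", "Nit: trailing whitespace."],
    rejected_examples=["Please add a docstring."],
)
print(prompt)
```

Delimiting each section with its own tag pair gives the model an unambiguous boundary between the material to judge and the examples that only provide guidance.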

Refactor filtering to return indices instead of full comments

6bb85f1

Add comment matching tool

f5d3dd1

Remove code review evaluation scripts

4856ffb

It will be replaced with W&B Weave evaluation pipeline

Add code review evaluation pipeline

78af5a2

Refactor create method for CodeReviewTool to use classmethod

4699a2c
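For readers unfamiliar with the pattern, a classmethod factory with default components might be sketched as below. All class names here (`DefaultSummarizer`, `DefaultFilterer`, `ReviewTool`, `VerboseReviewTool`) are hypothetical and the actual signature of `CodeReviewTool.create` is not shown in this PR page.

```python
from typing import Optional


class DefaultSummarizer:
    """Hypothetical default component."""

    def run(self, patch: str) -> str:
        return patch[:20]


class DefaultFilterer:
    """Hypothetical default component that keeps every suggestion."""

    def run(self, suggestions: list[str]) -> list[int]:
        return list(range(len(suggestions)))


class ReviewTool:
    def __init__(self, summarizer, filterer) -> None:
        self.summarizer = summarizer
        self.filterer = filterer

    @classmethod
    def create(cls, summarizer: Optional[object] = None, filterer: Optional[object] = None):
        # Using `cls` instead of a hard-coded class name means subclasses
        # calling create() get an instance of themselves.
        return cls(
            summarizer=summarizer if summarizer is not None else DefaultSummarizer(),
            filterer=filterer if filterer is not None else DefaultFilterer(),
        )


class VerboseReviewTool(ReviewTool):
    pass


default_tool = ReviewTool.create()
custom_filterer = DefaultFilterer()
verbose_tool = VerboseReviewTool.create(filterer=custom_filterer)
print(type(default_tool).__name__, type(verbose_tool).__name__)
```

Compared with a static method or a module-level function, the classmethod keeps the default wiring in one place while still working for subclasses.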

Fix typo in summarization prompt template

595cb8c

suhaibmujahid force-pushed the improve-revew-helper-4 branch from 7e1ca6b to 595cb8c January 12, 2026 02:30

marco-c approved these changes Jan 12, 2026

suhaibmujahid mentioned this pull request Jan 12, 2026

Potential circular import issue in SuggestionFilteringTool #5610

Open

suhaibmujahid merged commit c3b6696 into mozilla:master Jan 12, 2026
6 checks passed

suhaibmujahid deleted the improve-revew-helper-4 branch January 12, 2026 14:12

suhaibmujahid merged 16 commits into mozilla:master from suhaibmujahid:improve-revew-helper-4
